Using Network Function to Define and Identify Community Structure

We investigate a functional definition of community structure in complex networks. In particular, we consider networks whose function is enhanced by the ability to synchronize and/or by resilience to node failures. Previous work has shown that the largest eigenvalue of the network's adjacency matrix provides insight into both synchronization and percolation processes. Thus, for networks whose goal is to perform these functions, we propose a method that divides a given network into communities based on maximizing the largest eigenvalues of the adjacency matrices of the resulting communities.

I. INTRODUCTION

Complex networks have received great attention from the physics community. Networks serve as models for understanding properties of many real complex systems. They provide insight into the dynamical behavior and functional attributes of such systems. Over the last decade, interest in networks has grown substantially, partly spurred by the discovery of previously under-appreciated properties seen in real-world networks, e.g., small-world behavior [1], scale-free degree distributions [2], and assortative mixing [3]. The properties of networks have been studied at all levels, ranging from microscopic to mesoscopic to global. At the mesoscopic level, one potentially important property of networks is community structure. A community can be defined as a group of network nodes that interact more strongly with each other than with nodes outside their community. Community structure has been shown to exist in many real networks [4, 5, 6, 7, 8]. Such structures can have significant influence on the structure and dynamics of the network as a whole. For example, communities might be substructures that represent functional units, as in some biological systems [9].

In Ref. [10], it was found that for directed networks without community structure, a larger $\lambda^*$ (the largest eigenvalue of the network's adjacency matrix) makes the network more resilient to breaking up into many disconnected pieces when nodes are randomly removed (e.g., due to failure or attack). Equivalently, a larger $\lambda^*$ results in an earlier percolation phase transition. Also, certain types of synchronizing collective dynamical behavior are promoted by increasing $\lambda^*$ [11]. Thus, if a network's function depends on synchronizing or robustly maintaining connectivity, then looking at the largest eigenvalues of the adjacency matrices of communities may provide a natural basis for a useful definition of community structure on such networks. This provides the initial motivation for the method presented in this paper.

Another issue is that it has been common practice to apply algorithms designed for finding communities in undirected networks to directed networks by simply neglecting the directionality of links. As shown by Leicht and Newman [12], such a procedure may produce misleading results. As in Ref. [12], the method proposed in our paper is appropriate for finding community structure in directed networks, although it can be used for finding communities in undirected networks as well. Our method assigns each node to exactly one community and thus does not consider overlapping community structures.

The organization of this paper is as follows. In Section II we review the relation between the largest eigenvalue of the adjacency matrix of networks without community structure and their functional properties. In Section III we define an eigenvalue-based measure that can be used to determine community structure in networks. In Section IV we briefly describe the method we use to detect the community structure given our functional definition. In Section V we give results for our method and compare it with the modularity-based method. Finally, in Section VI we discuss open questions and directions for future work.



II. NETWORK FUNCTIONS AND THE LARGEST 
EIGENVALUE OF THE ADJACENCY MATRIX 

The largest eigenvalue of the adjacency matrix of networks without community structure can be used to characterize synchronization and percolation phenomena. Below, we discuss the significance of the largest eigenvalue of the network adjacency matrix for these network functions.



A. Synchronization 

Reference [11] considers synchronization in directed networks with the following scheme:

$$\dot{\theta}_i = \omega_i + K \sum_{j=1}^{N} A_{ij} \sin(\theta_j - \theta_i), \tag{1}$$

where $\theta_i$ and $\omega_i$ are the phase and intrinsic frequency of node $i$, $K$ is the coupling constant, and $N$ is the number of nodes in the network. Here, $A_{ij}$ is the $(i,j)$th entry of the adjacency matrix, which has value 1 if there is a link from node $j$ to node $i$ and 0 otherwise. When the nodes are allowed to synchronize, there is a critical coupling constant below which there is no synchronization of nodes and above which nodes synchronize spontaneously. This critical value of the coupling constant, $K_c$, depends on the largest eigenvalue, $\lambda^*$, of the network adjacency matrix [11]:

$$K_c = \frac{K_0}{\lambda^*}, \tag{2}$$

where $K_0$ is a constant which depends on the distribution of oscillator frequencies and is independent of the network characteristics. Finite or infinite population?

The synchronization of nodes in the network can be characterized by the magnitude of the global complex-valued order parameter, $r$, given by

$$r = \frac{1}{N}\left|\sum_{j=1}^{N} \left\langle e^{i\theta_j} \right\rangle_t\right|, \tag{3}$$

where $\langle \ldots \rangle_t$ denotes the time average. Coherent behavior of the system is signified by a nonzero value of $r$.
Thus, as seen above, the higher the largest eigenvalue, the earlier the phase transition (i.e., the smaller the critical coupling $K_c$).

B. Percolation

When nodes are removed at random, the percolation transition happens when a fraction [10]

$$p_c = 1 - \frac{1}{\lambda^*} \tag{4}$$

of the nodes has been randomly removed from the network. This result applies to locally tree-like networks. Thus, the larger the largest eigenvalue of the network, the more nodes need to be removed to disintegrate the network.

III. A FUNCTIONAL DEFINITION OF COMMUNITY
STRUCTURE USING EIGENVALUES

Motivated by the role played by $\lambda^*$ in synchronization and resilience in networks (Section II), we propose a measure that quantifies the strength of a network division into communities that have better synchronizability and robustness to random node failures. We view this as an example of a functional definition of communities that might be appropriate in some cases, but we also emphasize that other definitions could be better for other purposes.

For clarity, we can write the adjacency matrix, $A$, of networks with community structure in block matrix form as shown in Fig. 1. Each diagonal block of $A$ then corresponds to the adjacency matrix of an individual community, while the off-diagonal blocks correspond to the links between communities. We propose that, given a network, if we can find a partition of the network into communities that have higher largest eigenvalues of their corresponding adjacency matrices, then those communities will have enhanced network functions.

FIG. 1: The adjacency matrix of a network with $g$ communities in block matrix form. Each diagonal block corresponds to the adjacency matrix of a community, while the off-diagonal blocks correspond to links between communities.

The definition of community structure that we study is as follows:

1. Consider a partition of a network into $g$ communities.

2. Then break all links between communities.

3. Calculate the maximum eigenvalues ($\lambda^*_1, \lambda^*_2, \ldots, \lambda^*_g$) of the adjacency matrices of each isolated community.

4. Define a spectral cohesion function:

$$\Lambda_g = \sum_{k=1}^{g} \ln(\lambda^*_k). \tag{5}$$
The spectral cohesion function, $\Lambda_g$, provides a functionally based measure of the community strength of a particular partitioning of the network. We can thus define the best division into $g$ communities as the one that maximizes $\Lambda_g$, where we think of best as being with respect to the enhancement of synchronization or resilience.
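The four steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the code used for the results reported here: the function names are hypothetical, and returning negative infinity for a community whose largest eigenvalue is zero (for example, a community with no internal cycles, where the logarithm is undefined) is an assumption made for convenience.

```python
import numpy as np

def largest_eigenvalue(A):
    """Largest (Perron) eigenvalue of a nonnegative adjacency matrix."""
    return float(np.max(np.linalg.eigvals(A).real))

def spectral_cohesion(A, labels, tol=1e-9):
    """Sum of ln(lambda*_k) over the communities given by `labels`.

    A[i, j] = 1 means a link from node j to node i.
    labels[i] is the community index of node i.
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        # Steps 1-2: taking the diagonal block discards between-community links.
        block = A[np.ix_(idx, idx)]
        # Step 3: largest eigenvalue of the isolated community.
        lam = largest_eigenvalue(block)
        if lam <= tol:
            return -np.inf  # assumption: ln(0) is treated as -infinity
        # Step 4: accumulate the logarithm.
        total += np.log(lam)
    return float(total)
```

For two four-node cliques joined by a single link, the natural partition gives 2 ln 3, while moving one node of each clique into the other community lowers the value to 2 ln 2.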

While the method of maximizing the spectral cohesion function, $\Lambda_g$, gives us the best division into $g$ communities, it does not tell us how to choose the appropriate value of $g$, i.e., the natural number of communities that the network contains. We address this problem in another paper [13], using the eigenspectra of the adjacency matrix of the full network. There we show that the $g^*$ largest eigenvalues, where $g^*$ is the appropriate choice of $g$, are well separated from the remaining eigenvalues.

As an aside, we emphasize that our choice of Eq. (5) is somewhat arbitrary, and $\Lambda_g = \sum_{k} f(\lambda^*_k)$ for any function $f$ that is monotonically increasing (e.g., $f(\lambda) = \lambda^{\beta}$, $\beta > 0$) might alternatively be considered. However, we shall, in all of what follows, use $f(\lambda) = \ln(\lambda)$. This is partly motivated by the analogy to entropy, and by our studies with $f(\lambda) = \lambda^{\beta}$, for $\beta = 1$ and 2, which did not yield results qualitatively different from $f(\lambda) = \ln(\lambda)$.

This definition of community structure can be used for both symmetric and asymmetric matrices. In Section V we will demonstrate its utility for directed networks in particular.

IV. DETECTING FUNCTIONAL COMMUNITIES

Thus far, we have proposed a quantity whose maximization should yield the best division of a network into communities for the network functions we are interested in, but we have not specified how to identify this partitioning. In this section, we provide an outline for a simulated annealing procedure [14] that finds the appropriate division of the network. The advantage of this method is that it can provide a network division whose spectral cohesion function is very close to the true maximal value. The disadvantage is that it is computationally very intensive. In order to fairly compare our results with the modularity approach, we also perform simulated annealing on the modularity function. This measure rewards links between nodes in the same community when their number is higher than expected in a random network with no community structure. For directed networks, modularity ($Q$) is defined as [12]

$$Q = \frac{1}{m}\sum_{ij}\left[A_{ij} - \frac{d_i^{\mathrm{in}} d_j^{\mathrm{out}}}{m}\right]\delta_{c_i, c_j}, \tag{6}$$

where $A_{ij}$ is the $(i,j)$th entry of the adjacency matrix, $d_i^{\mathrm{in}}$ denotes the in-degree of node $i$, $d_j^{\mathrm{out}}$ denotes the out-degree of node $j$, and $m$ is the total number of links in the network. The subscripts $c_i$ and $c_j$ denote the community indices of nodes $i$ and $j$.
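For concreteness, the directed modularity of Eq. (6) can be evaluated as in the sketch below. The function name is hypothetical. The code uses the convention stated in Section II that the $(i,j)$ entry of the adjacency matrix is 1 for a link from node $j$ to node $i$, so in-degrees are row sums and out-degrees are column sums.

```python
import numpy as np

def directed_modularity(A, labels):
    """Directed modularity Q of Eq. (6) for a given community assignment.

    A[i, j] = 1 means a link from node j to node i.
    labels[i] is the community index of node i.
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    m = A.sum()                      # total number of links
    d_in = A.sum(axis=1)             # in-degree of node i (row sum)
    d_out = A.sum(axis=0)            # out-degree of node j (column sum)
    same = labels[:, None] == labels[None, :]   # delta(c_i, c_j)
    B = A - np.outer(d_in, d_out) / m           # A_ij - d_i^in d_j^out / m
    return float((B * same).sum() / m)
```

Two disconnected directed three-cycles, each treated as its own community, give Q = 0.5, while placing all nodes in a single community gives Q = 0.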

In our simulated annealing scheme, we begin by assigning nodes randomly to $g^*$ different communities, where we find $g^*$ as described above. We then choose a node at random and pick a random community, apart from its own, to which to consider moving it. If this move would result in an increase in the value of the function we consider (whether $\Lambda_g$ or $Q$), we always perform the move. If the move would result in a decrease in the value of the function, we perform it with Boltzmann acceptance probability $e^{-\Delta F/T}$, where $\Delta F$ is the decrease of the function $F$ to be maximized and $T$ is the temperature. In other words, we follow the Metropolis algorithm. For each temperature value, we do this for $\alpha N^2$ iterations, where $N$ is the number of nodes in the network. After experimentation with several values, we used $\alpha = 0.2$ for our results. After $\alpha N^2$ iterations, we reduced the temperature by a factor of 0.98. We repeat this until we reach an asymptotic value of $\Lambda_g$ (or $Q$).
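The annealing loop described above might be sketched as follows. This is a minimal illustration under stated assumptions, not our production code: the starting temperature `t_start`, the stopping temperature `t_end` (used in place of detecting an asymptotic value), and keeping track of the best assignment seen so far are all choices made for this sketch, and the function names are hypothetical. The objective is passed in as a function so that either the spectral cohesion function or modularity can be maximized.

```python
import math
import numpy as np

def spectral_cohesion(A, labels, g):
    """Sum of ln(lambda*_k) over communities 0..g-1; -inf if undefined."""
    total = 0.0
    for c in range(g):
        idx = np.flatnonzero(labels == c)
        if idx.size == 0:
            return -math.inf          # assumption: empty communities are not allowed
        lam = np.max(np.linalg.eigvals(A[np.ix_(idx, idx)]).real)
        if lam <= 1e-9:
            return -math.inf          # assumption: ln(0) treated as -infinity
        total += math.log(lam)
    return total

def anneal(objective, n_nodes, g, alpha=0.2, cooling=0.98,
           t_start=1.0, t_end=1e-2, seed=0):
    """Metropolis simulated annealing that maximizes objective(labels); needs g >= 2."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, g, size=n_nodes)     # random initial assignment
    current = objective(labels)
    best_labels, best = labels.copy(), current
    moves_per_temperature = max(1, int(alpha * n_nodes ** 2))
    T = t_start
    while T > t_end:
        for _ in range(moves_per_temperature):
            node = rng.integers(n_nodes)
            old = labels[node]
            labels[node] = (old + rng.integers(1, g)) % g   # a community other than its own
            proposed = objective(labels)
            # Always accept increases; accept decreases with probability exp(-dF / T).
            if proposed >= current or rng.random() < math.exp((proposed - current) / T):
                current = proposed
                if current > best:
                    best_labels, best = labels.copy(), current
            else:
                labels[node] = old                          # reject: undo the move
        T *= cooling                                        # reduce temperature by factor 0.98
    return best_labels, best
```

A call such as `anneal(lambda lab: spectral_cohesion(A, lab, 2), n_nodes=A.shape[0], g=2)` runs the procedure for two communities. Because every proposed move recomputes eigenvalues, the cost grows quickly with network size, which is the computational burden noted in this section.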

V. RESULTS 

In this section, we show the results for the method proposed in this paper applied to some real and artificial networks. As a comparison, we also show the results for the modularity-based method using simulated annealing. We tested synchronizability, resilience to random node failures, and percent of nodes classified correctly for the communities obtained using the two methods. The method proposed in this paper is expected to give the division of communities in which the two functional characteristics are enhanced. The computer-generated directed scale-free (with $\gamma = 2.5$) and Erdős-Rényi type networks with communities, used in this section, are constructed using the methods given in Ref. [13].

FIG. 2: (Color online) Magnitude of the average order parameter, $r$, versus the coupling constant, $K$, for the computer-generated directed scale-free and Erdős-Rényi type networks with 500 nodes and two equal-sized communities. Plots shown for the two communities combined. The error bars are too small to be significant.

A. Synchronization 

For all the networks used in this subsection, for synchronization, we consider the weights of between-community links to be zero. Thus, the communities synchronize independently of each other. For communities that have larger $\lambda^*_k$, synchronization is expected to start at lower values of $K$ compared to the ones that have smaller $\lambda^*_k$.

The directed networks considered in this subsection have 500 nodes, each with two equal-sized communities. The networks have $\langle d \rangle = 9$, where $\langle d \rangle$ for a directed network here denotes the average in-degree and the average out-degree, which are equal. The in-degree and out-degree at a node are assigned independently at random; thus the in- and out-degrees at the nodes are uncorrelated. On average, out of the 9 in/out links that connect to a node, 6 links connect the node to other nodes in its own community, while the remaining 3 links connect it to nodes outside its community.

For synchronization within each individual community, we used the same scheme as in Eq. (1), with all the oscillators now belonging to the community of interest and not to the whole network. The natural frequencies of the oscillators are assumed to be distributed normally with zero mean and unit standard deviation. The combined results for the synchronization of nodes in the two communities, for both the directed scale-free and Erdős-Rényi type networks, are shown in Fig. 2. The combined result for the whole network was calculated by using the weighted average,

$$\frac{N_1 r_1 + N_2 r_2}{N_1 + N_2},$$

as our order parameter. Here, $r_1$ and $r_2$ are the average order parameters of the two communities for a given coupling constant, and $N_1$ and $N_2$ are the numbers of nodes in the two communities. In Fig. 2, each data point is averaged over 20 different networks, and for each network we averaged $r_1$ and $r_2$ over 20 independent runs.

                     Optimizing $\Lambda_g$                           Optimizing $Q$
Networks             % correct     $\Lambda_g$     $Q$                % correct     $\Lambda_g$     $Q$
Scale-free           73.1 ± 3.4    3.688 ± 0.08    0.138 ± 0.010      86.2 ± 2.1    3.590 ± 0.09    0.187 ± 0.004
Erdős-Rényi type     84.5 ± 3.0    3.644 ± 0.03    0.177 ± 0.005      87.2 ± 2.9    3.621 ± 0.03    0.187 ± 0.004

TABLE I: Function values and percent of nodes classified correctly for the networks considered in this section. The results are averaged over 20 different realizations for each type of network. The errors shown are the standard errors.



In both types of networks, on average, the largest eigenvalues of the adjacency matrices of the communities obtained using the spectral cohesion method were slightly higher than those of the communities obtained using the modularity-based method. In this case, since the difference in the eigenvalues for the two methods was not too large, we find that the transition to coherence occurs at the same location.

In terms of percent of nodes classified correctly, modularity did better than the spectral cohesion function (Table I) for both types of networks. For the communities obtained by maximizing $\Lambda_g$ and $Q$, there were approximately 376 ± 22 common nodes among communities for the scale-free and 440 ± 16 common nodes among communities for the Erdős-Rényi type directed networks.
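One way to score the percent of nodes classified correctly is sketched below. Since community indices returned by an optimizer are arbitrary, this sketch takes the best agreement over all relabelings of the found communities; that matching rule and the function name are assumptions made for illustration, and the brute-force search over permutations is only practical for a small number of communities.

```python
from itertools import permutations

def percent_correct(true_labels, found_labels, g):
    """Percent of nodes whose found community matches the true one,
    maximized over all relabelings of the g found communities."""
    n = len(true_labels)
    best_hits = 0
    for perm in permutations(range(g)):
        # perm[f] is the true-community index assigned to found community f
        hits = sum(1 for t, f in zip(true_labels, found_labels) if perm[f] == t)
        best_hits = max(best_hits, hits)
    return 100.0 * best_hits / n
```

For example, `percent_correct([0, 0, 1, 1], [1, 1, 0, 0], 2)` gives 100.0, because the two assignments differ only by a swap of labels.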

The networks considered here are small. Bigger networks ($N = 1400$) give the expected results, with a slightly earlier transition.



B. Percolation 



FIG. 3: (Color online) The relative size of the sum of GSCCs vs. the fraction of nodes removed for the computer-generated scale-free and Erdős-Rényi type directed networks. The networks have 500 nodes each, with two equal-sized communities. Y-error bars correspond to the standard error.



As discussed in Section II, the largest eigenvalue of the network adjacency matrix is related to the percolation transition. When a network is attacked, it disintegrates when a sufficient number of nodes have been removed. This is signified by the disappearance of the giant strongly connected component (GSCC). To test the resilience of the communities obtained using the spectral cohesion method and the modularity method, we used the same networks as considered in the previous subsection. We tested the resilience of network communities to random node failures.

Figure 3 shows the combined result for the two communities for the scale-free and Erdős-Rényi type directed networks. At each step of the node removal process, we remove a node randomly from the network and find the sum of the sizes of the GSCCs of both communities. In Fig. 3, we plot the relative size of the sum of GSCCs versus the fraction of nodes removed from the network. The relative size of the sum of GSCCs is defined as the ratio of the sum of the sizes of the giant components of the two communities to the sum of the numbers of nodes present in both communities during the attack. We obtained curves for both the scale-free and Erdős-Rényi type directed networks by averaging over 20 different networks with 20 different node removal processes for each network. We do not find any difference in the results obtained using the spectral cohesion based method and the modularity method. This indicates that, for these networks, the eigenvalues of communities obtained using the two methods were too close to make any significant difference for percolation.
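The node removal experiment can be sketched for a single isolated community as follows. This is a simplified illustration, not the code behind Fig. 3: it tracks one adjacency matrix rather than summing over two communities, the function names are hypothetical, and the strongly connected component search (intersection of forward and backward reachable sets) is chosen for brevity rather than speed. A helper for the threshold of Eq. (4) is included for comparison.

```python
import numpy as np

def percolation_threshold(A):
    """Eq. (4): p_c = 1 - 1/lambda* (derived for locally tree-like networks)."""
    lam = float(np.max(np.linalg.eigvals(A).real))
    return 1.0 - 1.0 / lam

def _reachable(neighbors, start, alive):
    """Set of alive nodes reachable from `start` along the given neighbor lists."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in neighbors[u]:
            if alive[v] and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def largest_scc_size(A, alive):
    """Size of the largest strongly connected component among alive nodes.

    A[i, j] = 1 means a link from node j to node i.
    """
    N = A.shape[0]
    succ = [np.flatnonzero(A[:, u]).tolist() for u in range(N)]  # u -> i
    pred = [np.flatnonzero(A[u, :]).tolist() for u in range(N)]  # j -> u
    unassigned = {u for u in range(N) if alive[u]}
    best = 0
    while unassigned:
        u = next(iter(unassigned))
        # The SCC of u is the set of nodes reachable from u that can also reach u.
        scc = _reachable(succ, u, alive) & _reachable(pred, u, alive)
        best = max(best, len(scc))
        unassigned -= scc
    return best

def removal_curve(A, seed=0):
    """Relative size of the largest SCC after each random node removal."""
    rng = np.random.default_rng(seed)
    N = A.shape[0]
    alive = np.ones(N, dtype=bool)
    sizes = []
    for removed, node in enumerate(rng.permutation(N)[:-1], start=1):
        alive[node] = False
        sizes.append(largest_scc_size(A, alive) / (N - removed))
    return np.array(sizes)
```

Averaging `removal_curve` over many networks and removal orders, and combining the two communities, would give curves comparable in spirit to those in Fig. 3.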



C. Other examples 

Here, we compare our method with modularity in terms of the percent of nodes classified correctly for some computer-generated networks. As a first example, we consider directed networks with 32 nodes and two communities of equal size. To generate these networks, we start with Erdős-Rényi undirected networks with 32 nodes and no community structure. We then divide the nodes into two equal-sized communities. Within communities, each undirected link is made directed, pointing in either direction with probability 0.5. The undirected between-community links are made directed with a bias, such that more links point from one community to the other than the other way round. In these networks, there were 50 directed links within each community and a total of 100 directed links between communities. So when $x$ directed links point from one community to the other, $100 - x$ links point in the opposite direction. When $x = 50$, equal numbers of links point in both directions. However, this is the case of a directed network without any community structure. Varying $x$ gives us networks with varying degrees of community strength. We regard the networks in these cases as having community structure in the sense that nodes within communities are mutually affecting each other, while outside their communities the influence is biased and is directed more from one community to the other.
The results for this case are shown in Fig. 4. At low values of $x$, when we have more bias, the spectral cohesion function does better than modularity. At relatively higher values of $x$, both functions give similar results.

[Plot omitted; horizontal axis: number of edges from community A to community B.]

FIG. 4: (Color online) Percent correct vs. $x$ for computer-generated directed networks as explained in the text. The network has 32 nodes and two communities of equal size. All data points are averaged over 100 runs.

FIG. 5: (Color online) Percent correct vs. $z_{\mathrm{out}}$ for computer-generated directed networks with 128 nodes and four communities. The error bars are smaller than the symbol sizes.
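A generator for the 32-node benchmark with biased between-community links might look like the sketch below. It is an assumption-laden simplification: instead of starting from an Erdős-Rényi graph, it samples exactly 50 node pairs inside each community and 100 pairs between communities uniformly without replacement, which reproduces the stated link counts. The function name and parameters are hypothetical.

```python
from itertools import combinations, product
import numpy as np

def biased_benchmark(x, n_per=16, within=50, between=100, seed=0):
    """Two-community directed network with x links A->B and between-x links B->A.

    Returns (A, labels), where A[i, j] = 1 means a link from node j to node i.
    """
    rng = np.random.default_rng(seed)
    N = 2 * n_per
    A = np.zeros((N, N), dtype=int)
    comm_a, comm_b = range(n_per), range(n_per, N)

    # Within each community: pick node pairs, orient each link at random.
    for comm in (comm_a, comm_b):
        pairs = list(combinations(comm, 2))
        for k in rng.choice(len(pairs), size=within, replace=False):
            u, v = pairs[k]
            if rng.random() < 0.5:
                u, v = v, u
            A[v, u] = 1                      # link u -> v

    # Between communities: the first x chosen pairs point A -> B, the rest B -> A.
    cross = list(product(comm_a, comm_b))
    chosen = rng.choice(len(cross), size=between, replace=False)
    for count, k in enumerate(chosen):
        a, b = cross[k]
        if count < x:
            A[b, a] = 1                      # link a -> b
        else:
            A[a, b] = 1                      # link b -> a

    labels = np.array([0] * n_per + [1] * n_per)
    return A, labels
```

With `x = 50` the between-community links are balanced, and decreasing `x` toward 0 increases the directional bias.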

When the directionality of links is neglected, the above networks give undirected networks with no community structure. But the results seen in Fig. 4 suggest that community structure does exist in these networks, emphasizing the fact that, in general, methods used for undirected networks might not be suitable for directed networks.

Figure 5 shows the result for the Erdős-Rényi type directed networks with 128 nodes and $\langle d \rangle = 16$. The networks have four communities of equal size. We show the results for the percent of nodes classified correctly with varying average number of between-community in/out links in the network, $z_{\mathrm{out}}$, keeping $\langle d \rangle$ constant. Each data point in the plot is an average over 100 network realizations. As can be seen, the spectral cohesion function and modularity give similar results, with modularity doing slightly better at higher values of $z_{\mathrm{out}}$.





                    Optimizing $\Lambda_g$        Optimizing $Q$
Network             $\Lambda_g$    $Q$            $\Lambda_g$    $Q$        % common nodes
Jazz bands          10.095         0.441          10.084         0.444      96.0
Political blogs     6.817          0.416          6.816          0.431      94.7

TABLE II: Function values and percent of nodes in common for the real networks considered in this section.



D. Discovering communities in real world networks 

We also tested our method on some real networks. The real networks considered are the network of political blogs 10] and the network of jazz bands [8]. The political blogs network is a directed network of weblogs on US politics during the 2004 US presidential election. The edges are the hyperlinks connecting two blogs. The data for the network of jazz bands was obtained from The Red Hot Jazz Archive digital database. This network consists of bands that performed between 1912 and 1940. In this network, two bands are connected if there is a musician who has played in both bands.

The political blogs network has 1224 nodes with $\langle d \rangle = 15.6$. The eigenvalue plot [13] of the adjacency matrix of this network shows two eigenvalues well separated from the cloud of the rest of the eigenvalues. This implies that there are two well-defined communities in this network. The communities correspond to left/liberal and right/conservative blogs. We performed the simulated annealing procedure to divide the network into two communities. The results for the function values are shown in Table II, which also gives the percent of nodes common to the communities found by the spectral cohesion method and the modularity method. The optimized values of $\Lambda_g$ for the spectral cohesion method and the modularity method are very close. Despite the two well-defined communities, the $Q$ value for our method is slightly lower than that for the modularity method. This might be because in this network there were approximately 394 nodes that do not belong to any community's strongly connected component. Thus, in the spectral cohesion method, they were assigned based on the number of links such nodes have to other nodes, which could be problematic for our method if the giant in/out components are relatively large. Of the nodes that belonged to the strong components, 97.2 percent were common to the two community-finding methods.

The network of jazz bands is an undirected network with 198 nodes and $\langle d \rangle = 27.7$. The eigenvalue plot of this network shows three eigenvalues well separated from the bulk of the eigenvalue cloud, which indicates three strong communities [13]. The two strong communities in this network correspond to white and black musicians, which reflects racial segregation. The community of black musicians divides further into groups that performed in two major US cities, Chicago and New York [8]. Figure 6 shows the comparison of the spectral cohesion method and the modularity method for the jazz bands network.

FIG. 6: (Color online) Comparison of the spectral cohesion method and the modularity method for the jazz bands network. Different shapes of the nodes correspond to communities obtained by maximizing the spectral cohesion function, while different colors correspond to communities obtained by maximizing modularity.



VI. DISCUSSION AND CONCLUSIONS 



The spectral cohesion approach given in this paper for finding communities could be useful when we are interested in the functional characteristics of the network. Modularity, by contrast, is a definition based on the topological characteristics of the network. The example case of Fig. 4 shows a clear distinction between the spectral cohesion method and the modularity method.

We proposed a definition of network communities which is designed to be suited for networks where synchronization and percolation are important. For the networks considered in this paper, we did not show much improvement over modularity because the largest eigenvalues obtained using the two methods were not too different. Despite this, we experimented with this method because of its conceptual interest. The method we proposed here should be taken as an alternative definition to the existing methods in use today.